@ggerganov
Member

The AMX backend produces garbage with Q4_0 and Q4_1 quantizations:

https://github.com/ggml-org/llama.cpp/actions/runs/18070620247/job/51419855639#step:3:13646

Repro:

./bin/llama-perplexity -hf ggml-org/Qwen3-0.6B-GGUF:Q4_0 -f wikitext-2-raw/wiki.test.raw -c 2048 -b 2048 --chunks 2

The only problem that I was able to find using the address sanitizer is this unaligned access of the quants. However, fixing it does not resolve the incorrect results; some issue remains elsewhere.
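As a general illustration of what an unaligned-access fix can look like, here is a minimal sketch. It is not the code from this PR: `load_i32_unaligned` and `store_i32_unaligned` are hypothetical helper names, and the sketch only assumes that the offending code read a multi-byte value through a pointer that was not suitably aligned.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the PR): read a 32-bit value from an address
// that may not be 4-byte aligned. Dereferencing a misaligned int32_t * is
// undefined behavior in C++, whereas std::memcpy is valid for any alignment
// and typically compiles down to a single unaligned load on x86.
inline int32_t load_i32_unaligned(const void * p) {
    int32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

// Hypothetical counterpart for stores, for the same reason.
inline void store_i32_unaligned(void * p, int32_t v) {
    std::memcpy(p, &v, sizeof(v));
}
```

Routing the access through `std::memcpy` keeps sanitizers quiet without changing the bytes that are read, which is consistent with the observation that the results stay wrong after the fix.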

cc @mingfeima

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Sep 28, 2025
if (op->op == GGML_OP_MUL_MAT && is_contiguous_2d(op->src[0]) && // src0 must be contiguous
    is_contiguous_2d(op->src[1]) && // src1 must be contiguous
    op->src[0]->buffer && op->src[0]->buffer->buft == ggml_backend_amx_buffer_type() &&
    op->src[0]->ne[0] % (TILE_K * 2 * 32) == 0 && // TODO: not sure if correct (https://github.com/ggml-org/llama.cpp/pull/16315)
Member Author

Adding this fixes the CI workflows, though I'm not sure about the reasoning. It effectively requires the row size (`ne[0]` of `src0`) to be a multiple of 2048.
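The arithmetic behind the "multiple of 2048" remark can be sketched as follows. This is a standalone illustration, not the backend code: `TILE_K == 32` is an assumption inferred from the stated product (32 * 2 * 32 = 2048), and `AMX_ROW_MULTIPLE` and `amx_row_size_ok` are hypothetical names that mirror only the `ne[0]` condition of the check.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: TILE_K == 32, inferred from the comment that the check
// amounts to a multiple of 2048 (32 * 2 * 32 == 2048).
constexpr int64_t TILE_K = 32;

// Hypothetical name for the divisor used in the added condition.
constexpr int64_t AMX_ROW_MULTIPLE = TILE_K * 2 * 32;
static_assert(AMX_ROW_MULTIPLE == 2048, "expected the divisor to be 2048");

// Hypothetical helper mirroring only the ne[0] part of the check:
// the row size of src0 must be divisible by TILE_K * 2 * 32.
constexpr bool amx_row_size_ok(int64_t ne0) {
    return ne0 % AMX_ROW_MULTIPLE == 0;
}
```

Under this condition a row of 4096 elements passes, while rows of 1024 or 3072 elements do not, so such tensors would presumably no longer be handled by the AMX path.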

@ggerganov marked this pull request as ready for review September 29, 2025 07:38
@ggerganov requested a review from slaren as a code owner September 29, 2025 07:38
@Gadflyii
Contributor

Gadflyii commented Oct 2, 2025

I could not reproduce this error with:

build/bin/llama-perplexity -m /mnt/ssd2/AI/Qwen3_30B/Q4_0/Qwen3-30B-A3B-Thinking-2507-Q4_0.gguf -f /tmp/wikitext-2-raw/wiki.test.raw -c 2048 -b 2048 --chunks 2

Any suggestions?

What garbage are you getting out of the AMX backend?

@ggerganov merged commit a23b9bd into master Oct 6, 2025
65 of 67 checks passed
yael-works pushed a commit to yael-works/llama.cpp that referenced this pull request Oct 15, 2025
pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025